Statistics is at the core of data science. It supplies the logic, rules, and techniques of the field, and it is the lens through which we see and interpret data. It provides methods to collect, analyze, interpret, and present data effectively. Understanding statistical concepts is crucial for making data-driven decisions, identifying patterns, and drawing reliable conclusions from data, all of which are essential skills for a data scientist.
This introductory course covers the fundamental concepts of statistics, from basic descriptive methods through probability theory to some inferential techniques. Students will learn how to summarize data, understand probability distributions, estimate parameters, and conduct hypothesis tests, building a strong foundation for more advanced statistical analysis.
Introduction to the field of statistics, basic terminology, statistical thinking, types of data, and the role of statistics in the scientific method and data analysis.
Methods for organizing, summarizing, and visualizing data including frequency distributions, histograms, scatter plots, and other graphical representations.
Calculating and interpreting measures of central tendency (mean, median, mode) and dispersion (range, variance, standard deviation) to describe data distributions.
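As a minimal sketch of the measures named above, the following Python snippet uses only the standard library's `statistics` module. The data values are made up for illustration and are not course material.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of dispersion
data_range = max(data) - min(data)
pop_variance = statistics.pvariance(data)    # divides by n
sample_variance = statistics.variance(data)  # divides by n - 1
pop_std = statistics.pstdev(data)            # square root of pop_variance

print(mean, median, mode, data_range, pop_variance, sample_variance, pop_std)
```

Note the two variance functions: `pvariance` treats the data as the whole population, while `variance` treats it as a sample and divides by n - 1.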
Introduction to probability theory, rules of probability, conditional probability, Bayes' theorem, and probability as the foundation for statistical inference.
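To make Bayes' theorem concrete, here is a small sketch for a diagnostic-test scenario. The function name `bayes_posterior` and all of the numbers (prevalence, sensitivity, false positive rate) are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem.

    prior               = P(condition)
    sensitivity         = P(positive | condition)
    false_positive_rate = P(positive | no condition)
    """
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    evidence = true_pos + false_pos  # P(positive), by the law of total probability
    return true_pos / evidence


# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positive rate.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")
```

With these assumed inputs the posterior is about 0.16, far lower than the 95% sensitivity might suggest, because the condition is rare.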
Understanding and applying discrete random variables and probability distributions including binomial, Poisson, and hypergeometric distributions.
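The three discrete distributions named above each have a short closed-form probability mass function. The sketch below writes them out with the standard library only; the function names and parameter names are illustrative choices, not a fixed API.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson count with mean rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def hypergeometric_pmf(k, N, K, n):
    """P(X = k) successes when drawing n items without replacement
    from a population of N items that contains K successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


# Hypothetical usage: probability of exactly 3 heads in 10 fair coin flips.
print(binomial_pmf(3, 10, 0.5))
```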
Working with continuous random variables, with a focus on the normal distribution, standard normal distribution, and applications to real-world scenarios.
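The relationship between a normal distribution and the standard normal can be sketched with `statistics.NormalDist` from the Python standard library. The mean of 100 and standard deviation of 15 are assumed values for illustration only.

```python
from statistics import NormalDist

# Hypothetical parameters for a normally distributed measurement.
mu, sigma = 100.0, 15.0
dist = NormalDist(mu=mu, sigma=sigma)

x = 130.0
z = (x - mu) / sigma              # standardize: how many SDs above the mean

p_below = dist.cdf(x)             # P(X <= 130) on the original scale
p_same = NormalDist().cdf(z)      # same probability via the standard normal

# Probability of falling within one standard deviation of the mean.
p_within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(z, p_below, p_same, p_within_one_sd)
```

Standardizing first and then using the standard normal gives the same probability as working on the original scale, which is why a single z-table suffices for any normal distribution.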
Techniques for selecting samples from populations, including simple random sampling, stratified sampling, and cluster sampling, followed by an introduction to sampling distributions.
Methods for estimating population parameters using point estimates and interval estimates, and understanding confidence levels and margins of error.
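As a rough sketch of how a point estimate, a margin of error, and an interval estimate fit together, the snippet below builds a confidence interval for a mean. It assumes the normal approximation (a z critical value); for a small sample like this one, a t critical value would normally be used instead, but the standard library does not provide one. The helper name `mean_confidence_interval` and the data are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95):
    """Return (point_estimate, margin_of_error, (low, high)) for the mean.

    Assumption: uses a z critical value (normal approximation),
    which is only a rough guide for small samples.
    """
    n = len(sample)
    point_estimate = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)  # sample SD / sqrt(n)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    margin = z_crit * std_error
    return point_estimate, margin, (point_estimate - margin, point_estimate + margin)


# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
estimate, margin, (low, high) = mean_confidence_interval(sample)
print(f"{estimate} +/- {margin:.3f} -> ({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, while a larger sample shrinks the standard error and therefore the margin of error.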
Framework for statistical hypothesis testing, null and alternative hypotheses, p-values, type I and type II errors, and conducting z-tests and t-tests.
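The pieces of this framework can be seen together in a one-sample, two-sided z-test, sketched below. It assumes the population standard deviation is known; the function name `one_sample_z_test` and all numbers are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mu = mu0 against H1: mu != mu0.

    Assumes the population standard deviation sigma is known.
    Returns (z_statistic, p_value).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical study: n = 25, sample mean 52, H0 mean 50, known sigma 5.
z_stat, p_value = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=5.0, n=25)
alpha = 0.05  # significance level = tolerated probability of a type I error
reject_null = p_value < alpha
print(f"z = {z_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Here the p-value falls just below the assumed 0.05 significance level, so the null hypothesis would be rejected; with a stricter level such as 0.01 it would not be.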
Some of the concepts may seem to have little practical use, and in some cases that may be true. However, they still play a key role in understanding more complex concepts and proofs in more advanced topics in statistics and machine learning. For example, understanding distributions will help you understand Linear Regression (in Stat 3), and later several Machine Learning models that are based on maximum likelihood estimation.